Generative Knowledge Selection for Knowledge-Grounded Dialogues
Knowledge selection is key in knowledge-grounded dialogue (KGD): the aim is to
select an appropriate knowledge snippet to be used in the utterance based on
the dialogue history. Previous studies mainly take a classification approach,
labeling each candidate snippet as "relevant" or "irrelevant" independently.
However, such approaches neglect the interactions between snippets, making it
difficult to infer the meaning of individual snippets.
Moreover, they lack modeling of the discourse structure of dialogue-knowledge
interactions. We propose a simple yet effective generative approach for
knowledge selection, called GenKS. GenKS learns to select snippets by
generating their identifiers with a sequence-to-sequence model. GenKS therefore
captures intra-knowledge interaction inherently through attention mechanisms.
Meanwhile, we devise a hyperlink mechanism to model the dialogue-knowledge
interactions explicitly. We conduct experiments on three benchmark datasets,
and verify that GenKS achieves the best results on both knowledge selection
and response generation.
Comment: Findings of EACL-2
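As an illustration of the selection-as-generation idea, the sketch below (a hedged reconstruction, not the authors' released code) tags each candidate snippet with an identifier token, serializes the dialogue and snippets into one source sequence so self-attention can relate them, and decodes the identifier of the chosen snippet; the base model and tagging scheme are assumptions.

```python
# Minimal sketch of generative knowledge selection (illustrative only).
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
# In practice the model would be fine-tuned to emit snippet identifiers.
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

dialogue = "User: Who directed Inception?"
snippets = [
    "Inception is a 2010 science-fiction film.",
    "Inception was directed by Christopher Nolan.",
]

# Serialize dialogue and identifier-tagged snippets into one source sequence,
# so attention can capture snippet-snippet and dialogue-snippet interactions.
source = dialogue + " " + " ".join(f"<k{i}> {s}" for i, s in enumerate(snippets))
inputs = tokenizer(source, return_tensors="pt", truncation=True)

# A trained GenKS-style model would decode an identifier such as "<k1>".
ids = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```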
Entity Linking for Queries by Searching Wikipedia Sentences
We present a simple yet effective approach for linking entities in queries.
The key idea is to search sentences similar to a query from Wikipedia articles
and directly use the human-annotated entities in the similar sentences as
candidate entities for the query. Then, we employ a rich set of features, such
as link-probability, context-matching, word embeddings, and relatedness among
candidate entities as well as their related entities, to rank the candidates
under a regression-based framework. The advantages of our approach lie in two
aspects, which contribute to the ranking process and final linking result.
First, it can greatly reduce the number of candidate entities by filtering out
irrelevant entities with the words in the query. Second, we can obtain the
query sensitive prior probability in addition to the static link-probability
derived from all Wikipedia articles. We conduct experiments on two benchmark
datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ
dataset. Experimental results show that our method outperforms state-of-the-art
systems, yielding an F1 of 75.0% on the ERD14 dataset and 56.9% on the GERDAQ
dataset.
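A minimal sketch of the ranking stage, assuming illustrative feature values and a gradient-boosted regressor standing in for whatever regression model the paper actually uses; the feature names and numbers below are invented placeholders.

```python
# Illustrative pointwise regression ranking over hand-crafted features.
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: one feature vector per (query, candidate) pair:
# [link_probability, context_match, query_sensitive_prior, relatedness].
X_train = [
    [0.90, 0.70, 0.80, 0.60],  # a correct entity
    [0.40, 0.10, 0.20, 0.30],  # a wrong entity
    [0.75, 0.65, 0.70, 0.55],
    [0.20, 0.05, 0.10, 0.15],
]
y_train = [1.0, 0.0, 1.0, 0.0]  # relevance labels

ranker = GradientBoostingRegressor().fit(X_train, y_train)

# Candidates would be mined from human-annotated anchors in Wikipedia
# sentences similar to the query; features here are placeholders.
candidates = {
    "Harry Potter (film)": [0.8, 0.6, 0.7, 0.5],
    "Harry Potter (character)": [0.5, 0.3, 0.4, 0.4],
}
scores = {e: ranker.predict([f])[0] for e, f in candidates.items()}
print(max(scores, key=scores.get))  # top-ranked entity for the query
```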
A Modular Task-oriented Dialogue System Using a Neural Mixture-of-Experts
End-to-end Task-oriented Dialogue Systems (TDSs) have attracted a lot of
attention for their superiority (e.g., in terms of global optimization) over
pipeline modularized TDSs. Previous studies on end-to-end TDSs use a
single-module model to generate responses for complex dialogue contexts.
However, no model consistently outperforms the others in all cases. We propose
a neural Modular Task-oriented Dialogue System (MTDS) framework, in which a few
expert bots are combined to generate the response for a given dialogue context.
MTDS consists of a chair bot and several expert bots. Each expert bot is
specialized for a particular situation, e.g., one domain, one type of action of
a system, etc. The chair bot coordinates multiple expert bots and adaptively
selects an expert bot to generate the appropriate response. We further propose
a Token-level Mixture-of-Expert (TokenMoE) model to implement MTDS, where the
expert bots predict multiple tokens at each time step and the chair bot
determines the final generated token by fully taking into consideration the
outputs of all expert bots. Both the chair bot and the expert bots are jointly
trained in an end-to-end fashion. To verify the effectiveness of TokenMoE, we
carry out extensive experiments on a benchmark dataset. Compared with the
baseline using a single-module model, our TokenMoE improves performance by
8.1% in inform rate and 0.8% in success rate.
Comment: Proceedings of the 2019 SIGIR Workshop WCIS: Workshop on Conversational Interaction System
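The token-level mixing can be sketched as follows, assuming each expert bot exposes per-step vocabulary logits and the chair bot produces gate weights from the decoder state; the modules and shapes are illustrative, not the paper's exact architecture.

```python
# Hedged sketch of chair/expert token mixing at one decoding step.
import torch
import torch.nn as nn

vocab_size, hidden, n_experts = 1000, 64, 3

experts = nn.ModuleList([nn.Linear(hidden, vocab_size) for _ in range(n_experts)])
chair_gate = nn.Linear(hidden, n_experts)  # chair scores each expert per step

state = torch.randn(1, hidden)  # decoder state at the current time step

# Each expert proposes a distribution over the next token.
expert_probs = torch.stack([e(state).softmax(-1) for e in experts], dim=1)  # (1, E, V)

# The chair weighs the experts and mixes their proposals into one distribution.
gate = chair_gate(state).softmax(-1).unsqueeze(-1)   # (1, E, 1)
next_token_probs = (gate * expert_probs).sum(dim=1)  # (1, V)
print(next_token_probs.argmax(-1))  # token emitted at this step
```

Training the gate jointly with the experts end-to-end, as the abstract describes, lets the chair learn which expert to trust in which dialogue situation.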
Towards Empathetic Dialogue Generation over Multi-type Knowledge
Endowing machines with empathetic abilities to provide context-consistent
responses is crucial at both the semantic and emotional levels. The task of
empathetic dialogue generation is proposed to address this problem. However,
lacking external knowledge makes it difficult to perceive implicit emotions
from limited dialogue history. To address the above challenges, we propose to
leverage multi-type knowledge, i.e., commonsense knowledge and an emotional
lexicon, to explicitly understand and express emotions in empathetic dialogue
generation. We first enrich the dialogue history by jointly interacting with
both types of knowledge and construct an emotional context graph. Then we
introduce a multi-type knowledge-aware context encoder to learn emotional
context representations and distill emotional signals, which are prerequisites
for predicting the emotions expressed in responses. Finally, we propose an emotional
cross-attention mechanism to exploit the emotional dependencies between the
emotional context graph and the target empathetic response. Extensive
experiments on a benchmark dataset show that our proposed framework
outperforms state-of-the-art baselines in terms of both automatic metrics and
human evaluations.
Comment: arXiv admin note: text overlap with arXiv:1911.0869
Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology
Conversational interfaces are increasingly popular as a way of connecting
people to information. Corpus-based conversational interfaces are able to
generate more diverse and natural responses than template-based or
retrieval-based agents. With the increased generative capacity of corpus-based
conversational agents comes the need to classify and filter out malevolent
responses that are inappropriate in terms of content and dialogue acts.
Previous studies on the topic of recognizing and classifying inappropriate
content are mostly focused on a certain category of malevolence or on single
sentences instead of an entire dialogue. In this paper, we define the task of
Malevolent Dialogue Response Detection and Classification (MDRDC). We make
three contributions to advance research on this task. First, we present a
Hierarchical Malevolent Dialogue Taxonomy (HMDT). Second, we create a labelled
multi-turn dialogue dataset and formulate the MDRDC task as a hierarchical
classification task over this taxonomy. Third, we apply state-of-the-art text
classification methods to the MDRDC task and report on extensive experiments
aimed at assessing the performance of these approaches.
Comment: under review at JASIS
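A toy sketch of hierarchical classification over a two-level taxonomy: a fine-grained classifier's prediction is rolled up to its coarse parent. The labels and the keyword classifier below are invented placeholders, not the actual HMDT categories or the paper's models.

```python
# Hierarchical roll-up over a hypothetical two-level malevolence taxonomy.
FINE_TO_COARSE = {
    "insult": "hostile",
    "threat": "hostile",
    "deception": "manipulative",
    "none": "benign",
}

def classify_fine(utterance: str) -> str:
    """Stand-in for a trained text classifier (e.g., a fine-tuned BERT)."""
    return "insult" if "stupid" in utterance.lower() else "none"

def classify_hierarchical(utterance: str) -> tuple[str, str]:
    fine = classify_fine(utterance)
    return FINE_TO_COARSE[fine], fine  # (coarse level, fine level)

print(classify_hierarchical("You are so stupid."))  # ('hostile', 'insult')
```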
Improving Background Based Conversation with Context-aware Knowledge Pre-selection
Background Based Conversations (BBCs) have been developed to make dialogue
systems generate more informative and natural responses by leveraging
background knowledge. Existing methods for BBCs can be grouped into two
categories: extraction-based methods and generation-based methods. The former
extract spans from background material as responses, which are not necessarily
natural. The latter generate responses that are natural but not necessarily
effective in leveraging background knowledge. In this paper, we focus on
generation-based methods and propose a model, namely Context-aware Knowledge
Pre-selection (CaKe), which introduces a pre-selection process that uses
dynamic bi-directional attention to improve knowledge selection by using the
utterance history context as prior information to select the most relevant
background material. Experimental results show that our model is superior to
current state-of-the-art baselines, indicating that it benefits from the
pre-selection process, thus improving informativeness and fluency.
Comment: SCAI 2019 workshop paper
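The pre-selection step might look like the following sketch, which scores each background sentence with a simple bi-directional (context-to-background and background-to-context) attention; this is an assumption-laden simplification of CaKe's dynamic attention, not its exact formulation.

```python
# Score background sentences against the dialogue context (sketch).
import torch

def preselect(context_emb: torch.Tensor, background_embs: torch.Tensor, k: int = 1):
    """context_emb: (Tc, d) token embeddings of the dialogue context;
    background_embs: (S, Tb, d) token embeddings of S background sentences."""
    scores = []
    for sent in background_embs:            # (Tb, d)
        sim = context_emb @ sent.T          # (Tc, Tb) similarity matrix
        c2b = sim.max(dim=1).values.mean()  # context attends to background
        b2c = sim.max(dim=0).values.mean()  # background attends to context
        scores.append(c2b + b2c)
    return torch.topk(torch.stack(scores), k).indices

context = torch.randn(6, 32)
background = torch.randn(4, 10, 32)
print(preselect(context, background, k=2))  # indices of most relevant sentences
```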
Improving End-to-End Sequential Recommendations with Intent-aware Diversification
Sequential Recommendations (SRs), which capture users' dynamic intents by
modeling sequential user behaviors, can recommend products that closely match
users' current interests. Previous work on SRs is mostly focused on optimizing the recommendation
accuracy, often ignoring the recommendation diversity, even though it is an
important criterion for evaluating the recommendation performance. Most
existing methods for improving the diversity of recommendations are not
readily applicable to SRs because they assume that user intents are static and rely on
post-processing the list of recommendations to promote diversity. We consider
both recommendation accuracy and diversity for SRs by proposing an end-to-end
neural model, called Intent-aware Diversified Sequential Recommendation (IDSR).
Specifically, we introduce an Implicit Intent Mining module (IIM) into SRs to
capture different user intents reflected in user behavior sequences. Then, we
design an Intent-aware Diversity Promoting (IDP) loss to supervise the learning
of the IIM module and force the model to take recommendation diversity into
consideration during training. Extensive experiments on two benchmark datasets
show that IDSR significantly outperforms state-of-the-art methods in terms of
recommendation diversity while yielding comparable or superior recommendation
accuracy.
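A hedged sketch of implicit intent mining: K learned intent queries attend over the embedded behavior sequence to produce K intent vectors. The module name, shapes, and the use of standard multi-head attention are assumptions, not the paper's exact IIM.

```python
# K learned queries extract K intent vectors from a behavior sequence.
import torch
import torch.nn as nn

class IntentMiner(nn.Module):
    def __init__(self, d_model: int = 64, n_intents: int = 4):
        super().__init__()
        self.intent_queries = nn.Parameter(torch.randn(n_intents, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, behavior_seq: torch.Tensor) -> torch.Tensor:
        """behavior_seq: (B, T, d) item embeddings -> (B, K, d) intent vectors."""
        q = self.intent_queries.unsqueeze(0).expand(behavior_seq.size(0), -1, -1)
        intents, _ = self.attn(q, behavior_seq, behavior_seq)
        return intents

miner = IntentMiner()
print(miner(torch.randn(2, 20, 64)).shape)  # torch.Size([2, 4, 64])
```

A diversity-promoting loss such as the paper's IDP loss would then supervise these intent vectors so that recommendations cover distinct intents rather than one dominant intent.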
RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
Evaluating open-domain dialogue systems is challenging for reasons such as
the one-to-many problem, i.e., many responses other than the golden response
may be appropriate. Existing automatic evaluation methods correlate only
weakly with human judgments, while reliable human evaluation is time- and
cost-intensive. To this end, we propose the Reference-Assisted Dialogue
Evaluation (RADE) approach under the multi-task learning framework, which
leverages a pre-created utterance as a reference, in addition to the gold
response, to relieve the one-to-many problem. Specifically, RADE explicitly compares the
reference and the candidate response to predict their overall scores. Moreover,
an auxiliary response generation task enhances prediction via a shared encoder.
To support RADE, we extend three datasets with additional human-annotated
rated responses beyond the single golden response. Experiments on our three
datasets and two existing benchmarks demonstrate the effectiveness of our
method, where Pearson, Spearman, and Kendall correlations with human evaluation
outperform state-of-the-art baselines.
Comment: 19 pages, Accepted by ACL 2023 main conference
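For concreteness, this is how the reported correlations between an automatic evaluator's scores and human judgments are typically computed; the numbers below are toy data, not results from the paper.

```python
# Correlate automatic metric scores with human ratings (toy data).
from scipy.stats import pearsonr, spearmanr, kendalltau

human_scores = [4.0, 2.5, 3.0, 5.0, 1.0]   # human ratings of responses
metric_scores = [0.8, 0.4, 0.5, 0.9, 0.2]  # scores from an automatic evaluator

print("Pearson:", pearsonr(human_scores, metric_scores)[0])
print("Spearman:", spearmanr(human_scores, metric_scores).correlation)
print("Kendall:", kendalltau(human_scores, metric_scores).correlation)
```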
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent
Large Language Models (LLMs) have demonstrated a remarkable ability to
generalize zero-shot to various language-related tasks. This paper focuses on
the study of exploring generative LLMs such as ChatGPT and GPT-4 for relevance
ranking in Information Retrieval (IR). Surprisingly, our experiments reveal
that properly instructed ChatGPT and GPT-4 can deliver competitive, even
superior, results compared with supervised methods on popular IR benchmarks.
Notably, GPT-4 outperforms the fully fine-tuned monoT5-3B on MS MARCO by an
average of 2.7 nDCG on TREC datasets, an average of 2.3 nDCG on eight BEIR
datasets, and an average of 2.7 nDCG on the ten low-resource languages of
Mr.TyDi. Subsequently, we
delve into the potential for distilling the ranking capabilities of ChatGPT
into a specialized model. Our small specialized model, trained on 10K
ChatGPT-generated examples, outperforms monoT5 trained on 400K annotated MS
MARCO examples on BEIR. The code to reproduce our results is available at
www.github.com/sunnweiwei/RankGP
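In the spirit of the paper's listwise instruction approach, here is a minimal sketch of LLM-based re-ranking: number the passages, ask the model for an ordering, and parse it back. The prompt wording is illustrative and `llm` is a placeholder for any text-completion call; neither is the paper's exact template.

```python
# Listwise re-ranking with an LLM (hedged sketch).
import re
from typing import Callable, List

def rerank(query: str, passages: List[str], llm: Callable[[str], str]) -> List[str]:
    listing = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (f"Rank the passages below by relevance to the query.\n"
              f"Query: {query}\n{listing}\n"
              f"Answer with identifiers only, e.g. [2] > [1] > [3].")
    # Parse identifiers out of the model's answer; passages the model omits
    # or numbers it invents are simply dropped in this simplified version.
    ranking = [int(m) - 1 for m in re.findall(r"\[(\d+)\]", llm(prompt))]
    return [passages[i] for i in ranking if 0 <= i < len(passages)]

# Toy stand-in for an LLM call:
print(rerank("capital of France",
             ["Berlin is in Germany.", "Paris is the capital of France."],
             llm=lambda p: "[2] > [1]"))
```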